Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: transaction caching #252

Merged
merged 32 commits into from
Feb 6, 2024

Conversation

wadeking98
Copy link
Contributor

@wadeking98 wadeking98 commented Jan 18, 2024

Added functionality to cache read requests to the ledger.
I've provided added functionality to store the cache in memory or on disk in either a TTL or a LRU style cache. I've left the implementation extendable so users can easily implement storing the cache in a db, redis, whatever.

Demo:

The cache will keep values as valid for 24 hours, add the --use-cache flag to use the caching functionality. If the cache path isn't supplied then it will use a cache in memory instead of on the file system. If cache size is not specified then it will use a cache with size 1000
./target/debug/indy-vdr-proxy -g ./genesis.txn --use-cache --cache-size 1024 --cache-path /dev/shm/vdr-cache -p 3000

Cache:

cache

No Cache:

no-cache

With caching enabled on a request to lookup a schema, it takes under 32 milliseconds and is sometimes as fast as 7 milliseconds. Compared with when no cache is enabled, the same request will take over 135 milliseconds to almost a second

Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
@swcurran
Copy link
Member

For all but RevRegEntries, expiring the cache is not needed since the transactions are immutable. Would it make sense (is it possible?) have a fixed (maximum) sized cache with an LRU expiration policy?

Just as a matter of interest, what is the approach on RevRegEntries for finding cache hits? It seems unlikely there would be cache hits for RevRegEntry transactions, since the combination of from, to intervals would have to be the same.

@wadeking98
Copy link
Contributor Author

wadeking98 commented Jan 18, 2024

For all but RevRegEntries, expiring the cache is not needed since the transactions are immutable. Would it make sense (is it possible?) have a fixed (maximum) sized cache with an LRU expiration policy?

Just as a matter of interest, what is the approach on RevRegEntries for finding cache hits? It seems unlikely there would be cache hits for RevRegEntry transactions, since the combination of from, to intervals would have to be the same.

I like the idea of not having a time based expiration, the only thing with the LRU approach is that cache get request will have to mutate the cache as well (to mark an entry as recently used). I don't really have a problem with this but it might look a little weird for end users if the get cache requests mutate things. Still, I think it's a good idea.

In terms of the revocation registry, I think it's still a good idea to cache. For example, in the wallet the user may open a proof request multiple times, or select an alternate credential for the proof which would fetch the revocation registry entry. I think the wallet would benefit from having RevRegEntry transactions cached

@andrewwhitehead
Copy link
Member

andrewwhitehead commented Jan 18, 2024

You could add a TTL and maximum number of items, then prune the cache (removing expired items first) when adding new items. Objects like credential definitions are treated as immutable, but in reality can potentially be updated on the ledger (depending on the access controls).

Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
@wadeking98
Copy link
Contributor Author

I could add support for both and then the end user can decide which one to use, or do you think LRU should be avoided @andrewwhitehead ?

@andrewwhitehead
Copy link
Member

I think the only issue with straight LRU is that commonly used entries might never expire, which means they would never be re-fetched. Also, the get/insert/remove methods of the cache implementation don't necessarily need to take &mut self, because locking the Mutex gives you a mutable reference anyway.

@swcurran
Copy link
Member

I think the fetching patterns vary by object type. Is it worth worrying about that?

Specifically:

  • DIDs, ATTRIBs can be “updated” (e.g. a new instance written, and on read the latest is returned) so they do need a “moderate” TTL on them.
    • That said, for DIDs at least, we rarely need them, so reads are likely relatively rare.
  • CredDefs and RevRegs don’t ever expire, so can have a very long TTL
  • RevRegStatus requests (they aren’t ledger objects) must be cached using their query parameters (from, to) and as such should have a short TTL, as getting them out of the cache fairly quickly is probably a good idea. Every time a new presentation is generated, the query is likely to be different.

Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
@wadeking98
Copy link
Contributor Author

I think the only issue with straight LRU is that commonly used entries might never expire, which means they would never be re-fetched. Also, the get/insert/remove methods of the cache implementation don't necessarily need to take &mut self, because locking the Mutex gives you a mutable reference anyway.

Got it, I've left the LRU cache there just in case. I've implemented the TTL caching functionality as well. I think that I need to keep insert and remove mutable because I've replaced the mutex on the cache with a RwLock. I only had it as a mutex when the LRU cache was the only implementation we had

Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
@wadeking98
Copy link
Contributor Author

wadeking98 commented Jan 19, 2024

I think the fetching patterns vary by object type. Is it worth worrying about that?

Specifically:

* DIDs, ATTRIBs can be “updated” (e.g. a new instance written, and on read the latest is returned) so they do need a “moderate” TTL on them.
  
  * That said, for DIDs at least, we rarely need them, so reads are likely relatively rare.

* CredDefs and RevRegs don’t ever expire, so can have a very long TTL

* RevRegStatus requests (they aren’t ledger objects) must be cached using their query parameters (from, to) and as such should have a short TTL, as getting them out of the cache fairly quickly is probably a good idea.  Every time a new presentation is generated, the query is likely to be different.

I think this is a good idea... but I think may be complex enough to deserve it's own PR

@andrewwhitehead
Copy link
Member

I think that I need to keep insert and remove mutable because I've replaced the mutex on the cache with a RwLock

The same is true for RwLock. The contents of an Arc are not generally writable, except in cases like this where a write lock is obtained.

If the TTL varies by transaction type then it might be best for the caller to specify what that is in each request? Although we could also add some defaults.

@andrewwhitehead
Copy link
Member

One important case to test would be a GET_TXN for an undefined (not yet published) sequence number. This can return a response with a 'null' value which we do not want to cache.

Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
@wadeking98
Copy link
Contributor Author

I think that I need to keep insert and remove mutable because I've replaced the mutex on the cache with a RwLock

The same is true for RwLock. The contents of an Arc are not generally writable, except in cases like this where a write lock is obtained.

If the TTL varies by transaction type then it might be best for the caller to specify what that is in each request? Although we could also add some defaults.

I was thinking the TTL storage type could handle that, maybe in the new() function the user passes in a default ttl along with a list of enums that match the request type with an associated type of u64 or something that determines the ttl for that request type

@wadeking98
Copy link
Contributor Author

wadeking98 commented Jan 29, 2024

@andrewwhitehead I've updated the cache functionality. I tried your idea of having one cache per pool, but we still run into the issue with potential conflicts between ledgers that have the same genesis transactions. Additionally I realized that it would be difficult to clean up the fs cache in this scenario. Say a ledger updates it's genesis file, the pool would then have a new ID and a new cache with a different fs location. The old cache would stay on the file system and wouldn't be cleaned up.

I decided to use a global cache strategy and storage and then initialize a new cache instance with the pool genesis transactions as a key prefix. This way we don't have to worry about cache conflicts between different ledgers (so long as they have a different genesis), but we also don't have to worry about managing multiple locations on the file system

Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
@andrewwhitehead
Copy link
Member

Getting pretty close now I think. The solution for handling multiple ledgers should work, it's not the end of the world if the cache gets expired when the pool transactions are changed.

We may want to swap out sled later as it seems to be going without maintenance at the moment, but it's fine for now. fjall could be a good option? I'd like to find a key-value DB with random element access (or add that feature) so that this could be implemented for the cache eviction: https://danluu.com/2choices-eviction/

Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
libindy_vdr/include/libindy_vdr.h Outdated Show resolved Hide resolved
@@ -130,6 +130,12 @@ impl From<zmq::Error> for VdrError {
}
}

impl From<sled::Error> for VdrError {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can leave this as-is for now, but I think we can use #[from(sled::Error)] from thiserror and it will derive it properly.

libindy_vdr/src/pool/cache/mod.rs Outdated Show resolved Hide resolved
libindy_vdr/src/pool/cache/mod.rs Show resolved Hide resolved
wrappers/javascript/indy-vdr-nodejs/src/NodeJSIndyVdr.ts Outdated Show resolved Hide resolved
wrappers/javascript/indy-vdr-react-native/cpp/indyVdr.cpp Outdated Show resolved Hide resolved
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
Signed-off-by: wadeking98 <wkingnumber2@gmail.com>
@wadeking98
Copy link
Contributor Author

I'll need @berendsliedrecht or @andrewwhitehead to merge since I don't have write access to the repo

@andrewwhitehead andrewwhitehead merged commit f9f9752 into hyperledger:main Feb 6, 2024
25 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants